feat(grpc,rest): gRPC + grpc-gateway REST surface for public APIs (Phase 2)#616
feat(grpc,rest): gRPC + grpc-gateway REST surface for public APIs (Phase 2)#616lakhansamani wants to merge 1 commit into
Conversation
…ase 2)
Lands the v1 public-API surface over gRPC and REST while keeping the
existing GraphQL surface untouched. All nine proto services are authored
and registered with the gRPC server. MetaService.GetMeta is fully
implemented end-to-end as the vertical-slice proof; the other eight
services are registered as Unimplemented stubs so gRPC reflection shows
the complete API surface and REST `/v1/*` returns 501 for not-yet-migrated
ops. Subsequent PRs migrate ops out of internal/graphql into internal/service
one service at a time, swapping out the stubs.
Proto (proto/authorizer/*/v1):
- meta/v1, user/v1, session/v1 (+ magic_link_service)
- verification/v1 (email_verification, password_reset, otp_challenge)
- token/v1, authz/v1
- All response types are RPC-unique (buf STANDARD lint enforces this).
- google.api.http annotations drive REST mapping; no `:verb` paths.
- buf.validate.field on every Request; protovalidate interceptor enforces.
- Custom options: required_permissions, mcp_tool, audit_log, public.
Toolchain (proto/buf.gen.yaml):
- Added grpc-gateway plugin (Go REST stubs alongside gRPC).
- Added openapiv2 plugin → gen/openapi/authorizer.swagger.json (single
merged spec, served at /openapi.json).
gRPC server (internal/grpcsrv):
- server.go: registers all 9 services, gRPC reflection (gated on
--enable-grpc-reflection), gRPC health checking (grpc.health.v1.Health),
graceful shutdown.
- interceptors: recovery (panic → codes.Internal), logging (one structured
line per RPC), validate (protovalidate, request → codes.InvalidArgument).
- handlers/meta.go: MetaService.GetMeta delegates to service.Meta.
- handlers/stubs.go: UserService, SessionService, MagicLinkService,
EmailVerificationService, PasswordResetService, OtpChallengeService,
TokenService, AuthzService all return codes.Unimplemented.
- transport/grpc_metadata.go: gRPC metadata ↔ service.RequestMetadata
bridge; honors grpcgateway-* headers so REST and gRPC see the same
host/IP/UA/auth header.
REST gateway (internal/gateway):
- mount.go: serves grpc-gateway via in-process bufconn dial to the same
grpc.Server — no extra TCP hop, no TLS plumbing.
- JSONPb marshaler configured with UseProtoNames=true so REST payloads
match the existing GraphQL snake_case shape.
- Mounted at /v1/* under the existing gin router (shares CORS, security
headers, rate limit, logger middleware automatically).
- /openapi.json serves the merged swagger doc.
CLI flags & wiring (cmd/root.go, internal/config/config.go):
- --grpc-port (default 8081), --enable-grpc-reflection (default true)
- --grpc-tls-cert / --grpc-tls-key / --grpc-insecure (TLS plumbing
placeholders; current build runs cleartext for parity with HTTP).
- grpcsrv.Server constructed alongside http_handlers; server.Run starts
HTTP, metrics, AND gRPC listeners with shared graceful shutdown.
Service migration (one op):
- internal/service/meta.go: Meta migrated out of internal/graphql.
- internal/graphql/meta.go now delegates (same pattern as SignUp).
Tests:
- TestGRPCMeta: bufconn end-to-end through MetaServiceClient.GetMeta.
- TestRESTMeta: gateway translates GET /v1/meta to the same gRPC call.
- Full SQLite integration suite (65s) still green.
Follow-up PRs migrate remaining 17 public ops + replace stubs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lakhansamani
left a comment
There was a problem hiding this comment.
Principal-engineer review. This is a strong architectural cut — proto/handler separation is clean, the in-process bufconn gateway is a nice latency-saver, and using Unimplemented* stubs to register the full surface up front is the right call. The big concerns are around TLS not actually being wired, a port-collision default, and secrets in URL paths. Detail below.
Strengths
- Proto layout: per-package files (meta/user/session/verification/token/authz) with shared types in
common/v1, AIP-style resource naming (users/me,sessions/me), oneof forCreateSession.grant,FieldMaskforUpdateUser. Request/response uniqueness is preserved per buf STANDARD lint. Custom options forpublic,audit_log,required_permissions,mcp_toolare well-scoped and forward-compatible. - buf toolchain: managed mode + per-module disables for protovalidate/googleapis, merged openapiv2 with
json_names_for_fields=falseso the swagger uses the same snake_case names.require_unimplemented_servers=falseis intentional and matches the embedding strategy instubs.go. - gRPC server: interceptor ordering
recovery → logging → validateis correct (recovery outermost). Recovery captures stack to server logs but only surfacescodes.Internalto clients. Validate is built once at startup (protovalidate.New) and reused. Logging chooses level by code (Error for Internal/Unknown/DataLoss, Warn for everything non-OK). - REST gateway: in-process bufconn dial avoids a real loopback hop and TLS plumbing for the gateway leg. Mounting under gin means CORS, security headers, rate limit, CSRF, logger, metrics middleware all apply automatically (verified end-to-end in
http_routes.go). - Transport bridge: honoring
grpcgateway-*prefixed headers inMetaFromGRPCis the right pattern, and falling back to:authorityfor HostURL is a nice touch. - Vertical slice:
MetaService.GetMetaexercises proto → handler → service → response projection end-to-end, with bothTestGRPCMeta(bufconn) andTestRESTMeta(gateway through gin). The migration pattern proven here is repeatable.
Issues
Blocking
- TLS flags are dead code (
cmd/root.go:112-114,internal/grpcsrv/server.go:53).--grpc-tls-cert,--grpc-tls-key,--grpc-insecureare wired intoConfigbutgrpc.NewServer(...)is called with nogrpc.Credsoption and there is no startup check that refuses to start without TLS material whenGRPCInsecure=false. The config comment lies ("When unset and GRPCInsecure is false, the server refuses to start") — there is no such enforcement anywhere. Default behavior: the server boots cleartext gRPC regardless of--grpc-insecure. Either implement the TLS path (credentials.NewServerTLSFromFile, refuse to start without it) or — at minimum — flip--grpc-insecuretotruedefault with a loudlog.Warnand update the config doc to say "placeholder, TLS not yet wired". The PR description does call this out as "TLS plumbing placeholders" but the flag UX implies hard enforcement. - Port collision:
--grpc-portand--metrics-portboth default to 8081 (cmd/root.go:89, 110). The collision check oncmd/root.go:347only compares HTTP↔Metrics; it does not include GRPC. On a fresh install both ports are 8081 → one ofServecalls fails to bind (silent goroutine error log only). Add--grpc-portto the existing check and pick a different default (e.g. 8082). gen/openapi/authorizer.swagger.jsonis not shipped in the production image.DockerfileonlyCOPYscmd/,internal/,web/templates, and the built web assets — nevergen/openapi/. The handler athttp_routes.go:94reads the file via the relative pathgen/openapi/authorizer.swagger.jsonfrom CWD. In production/openapi.jsonwill always return 404. Fix options: embed via//go:embed gen/openapi/authorizer.swagger.json(no file-system dependency, no Dockerfile change needed) — strongly preferred. Otherwise add theCOPYand use a configurable absolute path.- Secrets in URL paths. Multiple methods route secrets through the path component, which means they land in access logs, ingress logs, proxy logs, OS kernel listen-backlog dumps, browser history, Referer headers:
DELETE /v1/refresh-tokens/{refresh_token}(token_service.proto:29)PUT /v1/email-verifications/{token}(email_verification_service.proto:34)PUT /v1/password-resets/{token}(password_reset_service.proto:28)PUT /v1/otp-challenges/{challenge_id}(otp_challenge_service.proto:29) — thechallenge_iddoc says "today: the email / phone the OTP was sent to", so the user identifier goes in the URL. PII at minimum, enumeration vector at worst.
Move tokens/OTPs into the request body. RFC 7009 (POST /oauth/revoke) and RFC 6749 (POST /oauth/token) both put credentials in the body for exactly this reason. For revoke this also means switching fromDELETEto a custom verb (POST /v1/refresh-tokens:revokeper AIP-136) orPOST /v1/refresh-tokens/revocationsbody-bearing.
Important
- Reflection defaults ON in production.
--enable-grpc-reflectiondefaults totrue(cmd/root.go:111). Reflection leaks the full API surface (including admin-adjacent methods once they migrate) to anyone who can reach the gRPC port. Default it to off, and have ops opt in for staging/dev. The CLAUDE.md project rules already markEnableGraphQLIntrospectionas a deploy-time concern; treat reflection the same way. gateway.bufSize = 1 << 20is fragile. The comment claims "large enough that in-process gateway calls never block". Not true —bufconnblocks once N writer-side bytes exceed the buffer until a reader drains it. A single REST request body larger than 1 MiB will stall the in-process call. The existing HTTP server enforcesMaxHeaderBytes: 1 << 20but does not cap body size, so a 5 MB POST through/v1/sessionsis reachable. Either bump bufSize to ~16 MiB or align it with a documented max body cap (gin Limit reader middleware on/v1/*).- Cookies are not extracted in
MetaFromGRPC(internal/grpcsrv/transport/grpc_metadata.go). When REST callers hit/v1/sessions/me, the gateway forwards the browserCookieheader as thegrpcgateway-cookiemetadata key. The transport bridge never reads it, so service methods that look atmeta.Cookies(every authenticated path) will see an empty slice. Also:meta.Requestis unset for gRPC callers, sosignup.go:329'sgcShim := &gin.Context{Request: meta.Request}will deref nil the moment any service method invoked via gRPC reaches that code path. This is fine forGetMetabut will explode in PR #4. Recommend: parseCookieheaders into[]*http.CookieinMetaFromGRPCand synthesize a minimal*http.Request(URL, Header, Host) so the gcShim pattern survives untilTokenProvideris refactored. ApplyToGRPCdrops all but the first Set-Cookie (grpc_metadata.go:61). The//nolint:staticcheckcomment acknowledges this.metadata.Pairs("Set-Cookie", a, b, c)produces a single comma-joined value, which is wrong for cookies (commas are valid in cookie values). The correct pattern ismetadata.MD{"set-cookie": values}andgrpc.SendHeaderonce with the full MD. As written, a service method that sets both the session cookie and the MFA cookie will silently lose one through the gateway.- No server-side keepalive / max message size / connection limits.
grpc.NewServeris called with only the interceptor chain. Nogrpc.KeepaliveEnforcementPolicy(clients can pummel withPINGframes), noMaxConcurrentStreams, noMaxRecvMsgSizeoverride (defaults to 4 MiB but worth explicit). For a public-facing server these matter; the HTTP listener already sets explicit timeouts — match that here. - Stub services advertise unimplemented methods on the wire. Every service is registered and reflection lists them. A client doing capability discovery will see e.g.
CreateUserand try it, gettingUnimplemented. That's the trade-off you took (it's documented instubs.go) and it's defensible — but it does mean a write call toPOST /v1/usersreturns 501 with no audit log, no rate-limit accounting, no observability. Worth adding a single registry/list of "unimplemented surface" so an operator can--disable-unimplementedif they don't want the noise. grpcsrv.Server.Runcallssrv.Serve(lis)for the public TCP listener, butgateway.Handleralso callsServeon the same*grpc.Serveragainst bufconn (server.go:73 vs server.go:108). MultipleServecalls on a*grpc.Serverare supported (s.lisis a map), but the ordering matters:Runschedules the TCP serve in a goroutine aftergateway.Handleralready started bufconnServe. If gateway.Handler errors (it shouldn't, but if it does), the public gRPC listener never starts. Not blocking, but the dependency would be more obvious if grpcsrv owned listener-startup ordering and the gateway only got a*grpc.ClientConn(or aDial(bufnet)callback).http_routes.go:88:router.Any("/v1/*path", gw)shadowing collision risk — once_authz_admin GraphQL prefixes (per recent commits) or any future/v1/...non-gateway routes are added under gin, they'll be silently overridden by the gateway catch-all. Add a guard test that no other gin route starts with/v1/.
Nit
- The
signup.gogcShimpattern is acknowledged with aTODO(grpc):— recommend filing a tracked follow-up before this PR merges so it doesn't ossify across all 8 future migration PRs. gateway.Handlersignature returns(http.Handler, func(), error)— typedCleanupnamed return or a smallGatewaystruct withHandler()/Close()reads cleaner than a bare func.internal/grpcsrv/server.go:81-83: reflection registration uses the v1 reflection API (default forreflection.Register). gRPC reflection v1alpha is also supported but considered deprecated; depending on client tooling (older grpcurl) considerreflection.RegisterV1. Not blocking.MetaServiceGetMetarequest typeGetMetaRequest{}is empty — perfectly fine, but considergoogle.protobuf.Emptyfor clarity. Counter-argument: empty request type future-proofs adding query fields. Either is defensible.audit_log/required_permissions/publicproto options exist but no interceptor reads them (verified — no grep hits ininternal/grpcsrv/). The proto comments imply runtime enforcement ("read at runtime by the gRPC server"). Update the comment inannotations.prototo "will be read at runtime in a follow-up PR; currently metadata-only" so future contributors don't think the auth/permission interceptors already exist.EmitUnpopulated=true+UseProtoNames=true: this DOES align with the existing GraphQL response shape (which always emits selected fields), but it means tokens/secrets in responses (access_token: ""when absent) get rendered as empty strings rather than omitted. Mildly leaky-by-convention; acceptable, but worth a comment inmount.goexplaining the trade-off (the existing comment only justifiesUseProtoNames, notEmitUnpopulated).- The bufconn goroutine in
gateway.Handlerswallows theServeerror (_ = grpcSrv.Serve(lis)). At minimum log at warn level on non-ErrServerStoppedreturns.
Test coverage gaps
The vertical slice for meta is good. Beyond that:
- No interceptor tests. Recovery (a panicking handler →
codes.Internal), validate (a request that fails buf.validate →codes.InvalidArgument), logging (level selection by code). These are pure unit tests and should land in this PR — they protect every future service migration. - No stub-returns-Unimplemented test. A single
for service := range allServices { call one RPC, expect codes.Unimplemented }would catch a misregistration during a future migration (e.g. someone removes the embeddedUnimplemented*Serverwithout replacing it). - No health endpoint test.
grpc.health.v1.Healthis registered; verifyHealthCheckreturnsSERVINGfor empty service name and that shutdown flips it toNOT_SERVINGbeforeGracefulStopreturns. - No reflection on/off test. Construct a server with
EnableGRPCReflection=falseand assert reflection dial returnsUnimplemented. - No gateway-through-full-middleware test.
TestRESTMetamounts the gateway under a baregin.New(), bypassing CSRF, CORS, rate limit, security headers, ClientCheck. A test that mounts the gateway under the actual production router (i.e. exercisesserver.NewRouter()) and verifies (a) CSRF blocksPOST /v1/sessionswithoutContent-Type: application/json, (b) rate-limit returns 429 after N requests, (c) CORS preflight returns the expected headers — would catch any middleware-order regression. - No openapi.json test. Trivial GET test that verifies the file is served (and once it's
go:embed-ed per blocking item #3, that it's not byte-empty). - No port-collision test. A startup test where
--grpc-port == --metrics-portshould fail loudly (after fixing blocking item #2). - No TLS-required test. After blocking item #1 is fixed, a startup test that with
--grpc-insecure=falseand no cert/key the server fails to start with a clear error.
Recommendation
Request changes. The blocking items (TLS dead code, port collision default, swagger file missing in image, secrets in URL paths) are all small fixes individually but each is a real production issue. Once those are addressed I'd be happy to approve — the architecture is solid and the migration pattern set up here will scale to the remaining 8 services cleanly.
Summary
Lands the v1 public-API surface over gRPC and REST while keeping the existing GraphQL surface untouched. All nine proto services are authored and registered with the gRPC server.
MetaService.GetMetais fully implemented end-to-end as the vertical-slice proof; the other eight services are registered as Unimplemented stubs so gRPC reflection shows the complete API surface and REST `/v1/*` returns 501 for not-yet-migrated ops. Subsequent PRs migrate ops out of `internal/graphql` into `internal/service` one service at a time, swapping out the stubs.Proto (`proto/authorizer/*/v1`)
Toolchain (`proto/buf.gen.yaml`)
gRPC server (`internal/grpcsrv`)
REST gateway (`internal/gateway`)
CLI flags & wiring (`cmd/root.go`, `internal/config/config.go`)
Service migration (one op)
Test plan
What's still stubbed (codes.Unimplemented in this PR)
`UserService` (Create/Get/Update/Delete), `SessionService` (Create/Get/Delete/Validate), `MagicLinkService` (Create), `EmailVerificationService` (Create/Confirm), `PasswordResetService` (Create/Confirm), `OtpChallengeService` (Create/Confirm), `TokenService` (CreateValidation/RevokeRefresh), `AuthzService` (ListMyPermissions). Each gets a follow-up PR that migrates its op(s) from `internal/graphql` to `internal/service` and replaces the stub.
🤖 Generated with Claude Code